Efficient Normal-Form Parsing for Combinatory Categorial Grammar
Under categorial grammars that have powerful rules like composition, a simple
n-word sentence can have exponentially many parses. Generating all parses is
inefficient and obscures whatever true semantic ambiguities are in the input.
This paper addresses the problem for a fairly general form of Combinatory
Categorial Grammar, by means of an efficient, correct, and easy-to-implement
normal-form parsing technique. The parser is proved to find exactly one parse
in each semantic equivalence class of allowable parses; that is, spurious
ambiguity (as carefully defined) is shown to be both safely and completely
eliminated.
Comment: 8 pages, LaTeX packaged with three .sty files, also uses cgloss4e.sty
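The abstract does not spell out the normal-form constraints themselves. As a rough illustration of the general idea, the sketch below checks one well-known constraint of this kind on an assumed derivation-tree representation: a constituent built by forward composition may not serve as the primary functor of a further forward combination, and symmetrically for backward rules. The Node class and rule labels are invented for the example, not taken from the paper.

```python
# Hypothetical rule labels: ">" / "<" for forward/backward application,
# ">B" / "<B" for forward/backward composition.
FORWARD = {">", ">B"}
BACKWARD = {"<", "<B"}

class Node:
    """A binary derivation node; rule is None at the leaves."""
    def __init__(self, rule=None, left=None, right=None):
        self.rule = rule
        self.left = left
        self.right = right

def in_normal_form(node):
    """Return True iff no forbidden sub-derivation occurs anywhere."""
    if node is None or node.rule is None:        # leaf: always fine
        return True
    # The primary functor is the left child of a forward rule and the
    # right child of a backward rule; it may not itself be the output
    # of a composition in the same direction.
    if node.rule in FORWARD and node.left.rule == ">B":
        return False
    if node.rule in BACKWARD and node.right.rule == "<B":
        return False
    return in_normal_form(node.left) and in_normal_form(node.right)
```

A chart parser enforcing such a check simply refuses to build the offending items, so each semantic equivalence class survives with exactly one representative parse.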
Three New Probabilistic Models for Dependency Parsing: An Exploration
After presenting a novel O(n^3) parsing algorithm for dependency grammar, we
develop three contrasting ways to stochasticize it. We propose (a) a lexical
affinity model where words struggle to modify each other, (b) a sense tagging
model where words fluctuate randomly in their selectional preferences, and (c)
a generative model where the speaker fleshes out each word's syntactic and
conceptual structure without regard to the implications for the hearer. We also
give preliminary empirical results from evaluating the three models' parsing
performance on annotated Wall Street Journal training text (derived from the
Penn Treebank). In these results, the generative (i.e., top-down) model
performs significantly better than the others, and does about equally well at
assigning part-of-speech tags.
Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty and acl.bst
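The O(n^3) algorithm itself is not reproduced in the abstract. For orientation, here is a sketch of the standard span-based dynamic program for projective dependency parsing that runs in that time, assuming an externally supplied arc-score matrix; each of the three stochastic models would define such scores differently.

```python
import numpy as np

def eisner_parse(score):
    """Find the best projective dependency tree in O(n^3) time.

    `score` is an (n x n) array where score[h, m] is the (assumed,
    externally supplied) score of an arc from head h to modifier m;
    position 0 is an artificial root.  Returns heads[1..n-1].
    """
    n = score.shape[0]
    # C/I: best complete/incomplete spans; d=1 head at the left end,
    # d=0 head at the right end.  b* store backpointers.
    C = np.full((n, n, 2), -np.inf)
    I = np.full((n, n, 2), -np.inf)
    C[np.arange(n), np.arange(n), :] = 0.0
    bC = np.zeros((n, n, 2), dtype=int)
    bI = np.zeros((n, n, 2), dtype=int)

    for w in range(1, n):                  # span width
        for i in range(n - w):
            j = i + w
            # Build incomplete spans by adding an arc between i and j.
            vals = C[i, i:j, 1] + C[i + 1:j + 1, j, 0]
            k = int(np.argmax(vals))
            I[i, j, 1] = vals[k] + score[i, j]   # arc i -> j
            I[i, j, 0] = vals[k] + score[j, i]   # arc j -> i
            bI[i, j, :] = i + k
            # Extend incomplete spans into complete ones.
            vals = I[i, i + 1:j + 1, 1] + C[i + 1:j + 1, j, 1]
            k = int(np.argmax(vals))
            C[i, j, 1] = vals[k]; bC[i, j, 1] = i + 1 + k
            vals = C[i, i:j, 0] + I[i:j, j, 0]
            k = int(np.argmax(vals))
            C[i, j, 0] = vals[k]; bC[i, j, 0] = i + k

    heads = [-1] * n
    def walk(i, j, d, complete):
        if i == j:
            return
        if complete:
            k = bC[i, j, d]
            if d == 1: walk(i, k, 1, False); walk(k, j, 1, True)
            else:      walk(i, k, 0, True);  walk(k, j, 0, False)
        else:
            heads[j if d == 1 else i] = i if d == 1 else j
            k = bI[i, j, d]
            walk(i, k, 1, True); walk(k + 1, j, 0, True)
    walk(0, n - 1, 1, True)
    return heads
```

The cubic bound comes from the three free indices i, j, k; the trick is that spans record only a head at one end rather than a full subtree.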
A Deep Generative Model of Vowel Formant Typology
What makes some types of languages more probable than others? For instance,
we know that almost all spoken languages contain the vowel phoneme /i/; why
should that be? The field of linguistic typology seeks to answer these
questions and, thereby, divine the mechanisms that underlie human language. In
our work, we tackle the problem of vowel system typology, i.e., we propose a
generative probability model of which vowels a language contains. In contrast
to previous work, we work directly with the acoustic information -- the first
two formant values -- rather than modeling discrete sets of phonemic symbols
(IPA). We develop a novel generative probability model and report results based
on a corpus of 233 languages.
Comment: NAACL 2018
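The paper's actual deep generative model is not detailed in the abstract. The toy sketch below only illustrates the general shape of a generative story over inventories of raw (F1, F2) formant pairs: sample an inventory size, then sample each vowel's formants from a mixture anchored near the corner vowels. Every number and distributional choice here is invented for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented "anchor" regions in (F1, F2) space (Hz), loosely evoking
# the near-universal corner vowels /i/, /a/, /u/.
ANCHORS = np.array([[280.0, 2250.0],   # /i/-like
                    [700.0, 1200.0],   # /a/-like
                    [310.0,  700.0]])  # /u/-like
SPREAD = np.array([60.0, 150.0])       # per-formant std. dev. (invented)

def sample_inventory(mean_size=5.0):
    """Generate one language's vowel inventory as (F1, F2) points."""
    size = max(3, rng.poisson(mean_size))            # at least the corners
    comps = rng.integers(0, len(ANCHORS), size=size)
    return ANCHORS[comps] + rng.normal(0.0, SPREAD, size=(size, 2))

def log_likelihood(inventory):
    """Score an observed inventory under the same toy mixture."""
    ll = 0.0
    for f in inventory:
        # log N(f | anchor, spread), summed over the two formants,
        # then log-sum-exp over the equally weighted components.
        comp = (-0.5 * ((f - ANCHORS) / SPREAD) ** 2
                - np.log(SPREAD * np.sqrt(2 * np.pi))).sum(axis=1)
        ll += np.logaddexp.reduce(comp) - np.log(len(ANCHORS))
    return ll
```

The point of working in formant space, as the abstract notes, is that the model scores real acoustic values directly instead of a discrete IPA symbol inventory.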
Predicting Fine-Grained Syntactic Typology from Surface Features
We show how to predict the basic word-order facts of a novel language given only a corpus of its part-of-speech (POS) sequences. We predict how often direct objects follow their verbs, how often adjectives follow their nouns, and in general the directionalities of all dependency relations. Although recovering syntactic structure is usually regarded as unsupervised learning, we train our predictor on languages of known structure. It outperforms state-of-the-art unsupervised learning by a large margin, especially when we augment the training data with many synthetic languages. Full details can be found at http://www.cs.jhu.edu/~jason/papers/#wang-eisner-2017
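The abstract leaves the predictor unspecified. The sketch below shows one plausible shape for the supervised setup it describes: turn each training language's POS sequences into a vector of surface co-occurrence statistics, then regress a directionality proportion on that vector. The feature set, tag inventory, and choice of ridge regression are all assumptions for illustration, not the paper's actual design.

```python
from collections import Counter
import numpy as np
from sklearn.linear_model import Ridge

def surface_features(pos_corpus, window=3):
    """Map one language's POS corpus to a fixed-length feature vector.

    pos_corpus: list of POS-tag sequences (one per sentence).
    Features: normalized counts of ordered tag pairs (a before b)
    within a small window -- a crude proxy for word-order tendencies.
    """
    pair_counts, total = Counter(), 0
    for sent in pos_corpus:
        for i, a in enumerate(sent):
            for b in sent[i + 1 : i + 1 + window]:
                pair_counts[(a, b)] += 1
                total += 1
    tags = ["NOUN", "VERB", "ADJ", "ADP", "DET"]     # assumed tag set
    return np.array([pair_counts[(a, b)] / max(total, 1)
                     for a in tags for b in tags])

# Training (schematic): X stacks one feature vector per language of
# *known* structure; y holds, e.g., the fraction of direct objects
# that follow their verb in that language.
#   model = Ridge(alpha=1.0).fit(X, y)
#   pred = model.predict(surface_features(novel_corpus)[None, :])
```

Synthetic training languages, as the abstract suggests, would simply add more (feature vector, directionality) rows before fitting.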